feat: Add NETopKV function.#1251

Merged
morgolock merged 1 commit into main from pr/cpu_topkv on Jan 28, 2026


@morgolock
Contributor

  • The Neon(TM) implementation of TopKV reduces execution time from 447.8 ms (scalar CPP) to 11.65 ms for the same workload (F32, C=1000, N=32000, k=3, 6 threads), achieving an approximate 38× speedup. This gain comes from SIMD vectorization, removal of per-element branches, and a more efficient inner loop.

  • Resolves ARMCL-1227

Change-Id: Ifdf161ce4254dc5ecd57aff9ae22410facd31705
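For readers unfamiliar with the branch-removal point above, here is a minimal portable sketch of the idea. It is not the code in this PR and not the library's API: `topkv_sketch` and its parameters are hypothetical, and it assumes TopKV means "for each of the N rows, report whether the target class is among the k highest of the C scores". Under that assumption, the rank of the target can be computed by counting how many scores are strictly greater than the target's score, which turns the inner loop into a compare-and-accumulate with no data-dependent branch, a shape that maps naturally onto SIMD compares.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only (not part of Compute Library).
// predictions: N rows of num_classes scores, stored row-major.
// targets:     N class indices, one per row.
// Returns one byte per row: 1 if fewer than k scores in the row are strictly
// greater than the target's score (assumed "target is in the top k"), else 0.
std::vector<std::uint8_t> topkv_sketch(const std::vector<float>         &predictions,
                                       const std::vector<std::uint32_t> &targets,
                                       std::size_t                       num_classes,
                                       std::size_t                       k)
{
    const std::size_t         num_rows = targets.size();
    std::vector<std::uint8_t> out(num_rows, 0);

    for (std::size_t row = 0; row < num_rows; ++row)
    {
        const float *scores       = predictions.data() + row * num_classes;
        const float  target_score = scores[targets[row]];

        // Branch-free inner loop: the comparison result (0 or 1) is added
        // directly to the counter instead of guarding an `if`.
        std::size_t greater = 0;
        for (std::size_t c = 0; c < num_classes; ++c)
        {
            greater += static_cast<std::size_t>(scores[c] > target_score);
        }

        out[row] = static_cast<std::uint8_t>(greater < k);
    }
    return out;
}
```

With the benchmark shape quoted above this would be called with `num_classes = 1000`, 32000 rows and `k = 3`. Tie handling (strict `>`) is a choice made for this sketch and may differ from the merged kernel.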

@morgolock morgolock force-pushed the pr/cpu_topkv branch 4 times, most recently from 89cd2b8 to cd41038 Compare January 16, 2026 16:25
@morgolock morgolock force-pushed the pr/cpu_topkv branch 4 times, most recently from e0090dc to 11238b6 Compare January 20, 2026 19:06
@morgolock morgolock force-pushed the pr/cpu_topkv branch 2 times, most recently from b0f0bdb to b94b4e9 Compare January 27, 2026 11:17
@morgolock morgolock requested a review from gunes-arm January 28, 2026 13:06
@morgolock morgolock force-pushed the pr/cpu_topkv branch 3 times, most recently from c0541a1 to 7b32d41 Compare January 28, 2026 15:34
* The Neon(TM) implementation of TopKV reduces execution time from 447.8 ms (CPP) to 11.65 ms for the same workload (F32, C=1000, N=32000, k=3, 6 threads), achieving an approximate 38× speedup. This gain comes from SIMD vectorization, removal of per-element branches, and a more efficient inner loop.

* Resolves ARMCL-1227

Change-Id: Ifdf161ce4254dc5ecd57aff9ae22410facd31705
Signed-off-by: Pablo Marquez Tello <pablo.tello@arm.com>
@morgolock morgolock merged commit 9c9151c into main Jan 28, 2026
2 checks passed
@morgolock morgolock deleted the pr/cpu_topkv branch January 28, 2026 17:40
2 participants